Optimizing segmentation granularity for neural machine translation
Similar resources
Optimizing Chinese Word Segmentation for Machine Translation Performance
Previous work has shown that Chinese word segmentation is useful for machine translation to English, yet the way different segmentation strategies affect MT is still poorly understood. In this paper, we demonstrate that optimizing segmentation for an existing segmentation standard does not always yield better MT performance. We find that other factors such as segmentation consistency and granul...
Target-side Word Segmentation Strategies for Neural Machine Translation
For efficiency considerations, state-of-the-art neural machine translation (NMT) requires the vocabulary to be restricted to a limited-size set of several thousand symbols. This is highly problematic when translating into inflected or compounding languages. A typical remedy is the use of subword units, where words are segmented into smaller components. Byte pair encoding, a purely corpus-based a...
Morpheme-Aware Subword Segmentation for Neural Machine Translation
Neural machine translation together with subword segmentation has recently produced state-of-the-art translation performance. The commonly used segmentation algorithm based on byte-pair encoding (BPE) does not consider the morphological structure of words. This occasionally causes misleading segmentation and incorrect translation of rare words. In this thesis we explore the use of morphological...
Optimizing Segmentation Strategies for Simultaneous Speech Translation
In this paper, we propose new algorithms for learning segmentation strategies for simultaneous speech translation. In contrast to previously proposed heuristic methods, our method finds a segmentation that directly maximizes the performance of the machine translation system. We describe two methods based on greedy search and dynamic programming that search for the optimal segmentation strategy....
Optimizing sentence segmentation for spoken language translation
The conventional approach in text-based machine translation (MT) is to translate complete sentences, which are conveniently indicated by sentence boundary markers. However, since such boundary markers are not available for speech, new methods are required that define an optimal unit for translation. Our experimental results show that with a segment length optimized for a particular MT system, i...
Journal
Journal title: Machine Translation
Year: 2020
ISSN: 0922-6567,1573-0573
DOI: 10.1007/s10590-019-09243-8